1 Abstract

Air pollution is one of the major concern in today’s world. It can have very serious cost, penalites and consequences for the health of human beings and also ruthlessly distresses natural bio-network and ecosystems.Out of five major air pollutants (round-level ozone, particle pollution also known as particulate matter, carbon monoxide, sulfur dioxide, and nitrogen dioxide) WHO has identified SPM (Suspended Particulated Matter) as most sinister in terms of its effects on health. You may experience health issues such as Irregular heartbeat,Respiratory symptoms like coughing, wheezing or difficulty breathing etc. Therefore measuring these pollutants on daily basis and forecast is vital for any government. Daily measuring and forecasts can help Governments in designing policies to control air pollution and issue a guidance to citizens of possible health impacts and precautions to be taken if air quality becomes bad or predicted to be bad on next day.

2 Intro to Dataset

With this little background, for academic purposes we have decided to use Air Quality dataset from UCI machine learning respository for time series term project. This data set includes hourly air pollutants data from 12 nationally-controlled air-quality monitoring sites. The air-quality data are from the Beijing Municipal Environmental Monitoring Center. The meteorological data in each air-quality site are matched with the nearest weather station from the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017.

This hourly data set considers 6 main air pollutants and 6 relevant meteorological variables at multiple sites in Beijing.

SrNo Air Pollutants Description
1 PM2.5 PM2.5 concentration (ug/\(m^{3}\))
2 PM10 PM10 concentration (ug/\(m^{3}\))
3 SO2 Sulphur Dioxide concentration (ug/\(m^{3}\))
4 NO2 Nitrogen Dioxide concentration (ug/\(m^{3}\))
5 CO Carbon Monoxide concentration (ug/\(m^{3}\))
6 O3 Ozone concentration (ug/\(m^{3}\))
7 TEMP temperature (degree Celsius)
8 PRES pressure (hPa)
9 DEWP dew point temperature (degree Celsius)
10 RAIN precipitation (mm)
11 wd wind direction
12 WSPM wind speed (m/s)

Goal
Goal of this project is to develop a model to forecast on all major air pollutants PM2.5,PM10,SO2,NO2,CO and O3 at least for next 10 days and effect of these pollutants on temperature and rain (Multivariate Analysis)

This dataset contains observations from 12 Chinese cities. For initial analysis (EDA) we are using observations from the city called Dongsi and it will be extended to two more cities for the purpose of the project and will try fit best possible model based on analysis.

Following are city names :

SrNo City
1 Aoizhongxin
2 Changping
3 Dingling
4 Dongsi
5 Guanyuan
6 Gucheng
7 Huairou
8 Nongzhanguan
9 Shunyi
10 Titantan
11 Wanliu
12 Wanshouxigong

Let’s use data from the city called Dongsi.

3 Analysis of Dataset

3.1 Structure

## 'data.frame':    35064 obs. of  18 variables:
##  $ No     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year   : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ month  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ day    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ hour   : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ PM2.5  : num  9 4 7 3 3 4 5 3 3 3 ...
##  $ PM10   : num  9 4 7 3 3 4 5 6 6 6 ...
##  $ SO2    : num  3 3 NA 5 7 9 10 12 12 9 ...
##  $ NO2    : num  17 16 17 18 NA 25 29 40 41 31 ...
##  $ CO     : int  300 300 300 NA 200 300 400 400 500 400 ...
##  $ O3     : num  89 88 60 NA 84 78 67 52 54 69 ...
##  $ TEMP   : num  -0.5 -0.7 -1.2 -1.4 -1.9 -2.4 -2.5 -1.4 -0.3 0.4 ...
##  $ PRES   : num  1024 1025 1025 1026 1027 ...
##  $ DEWP   : num  -21.4 -22.1 -24.6 -25.5 -24.5 -21.3 -20.4 -20.4 -21.2 -23.3 ...
##  $ RAIN   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ wd     : Factor w/ 16 levels "E","ENE","ESE",..: 7 8 7 4 7 8 8 7 8 4 ...
##  $ WSPM   : num  5.7 3.9 5.3 4.9 3.2 2.4 2.2 3 4.6 5.5 ...
##  $ station: Factor w/ 1 level "Dongsi": 1 1 1 1 1 1 1 1 1 1 ...

Combine Day Month and Year fields into single Date field.

3.2 Missing Values

3.2.1 VIM::aggr()

This is aggregration plot for missing values

## Warning in plot.aggr(res, ...): not enough vertical space to display frequencies
## (too many combinations)

## 
##  Variables sorted by number of missings: 
##  Variable        Count
##        CO 0.0911761351
##       NO2 0.0456593657
##     PM2.5 0.0213894593
##        O3 0.0189368013
##       SO2 0.0189082820
##      PM10 0.0157711613
##      TEMP 0.0005703856
##      PRES 0.0005703856
##      DEWP 0.0005703856
##      RAIN 0.0005703856
##      WSPM 0.0003992699

3.2.2 VIM::marginplot()

marginplot shows distribution of missing values.

Interpretation

  • All marginplots below shows distribution of missing values overlapping with non-missing values
  • There is no specific pattern in missingness
  • values are missing completely at random.
  • Imputing missing values is advised.
  • ImputeTS to be used for imputing missing vales

3.3 Imputations

For imputation we will be using R package imputeTS . It has different methods depending on seasonality and trends to impute values for time series.

So let’s first plot the each series by excluding rows with NA values to check if there is trend or seasonality in the data and apply these methods [1] accordingly.

3.3.1 PM2.5 Series

Last observation carried forward Method used

Notice that the trend-cycle (in red) is smoother than the original data and captures the main movement of the time series without all of the minor fluctuations. The order of the moving average (351-MA) determines the smoothness of the trend-cycle estimate and it shows there is no deterministic trend present in the data. There appears to be wandering behavior in the data.

3.3.2 PM10 Series

Last observation carried forward Method used

3.3.3 SO2 Series

Seasonal Adjustment then LOCF

Removes the seasonal component from the time series, performs imputation on the deseasonalized series and afterwards adds the seasonal component again.

3.3.4 NO2 Series

Last observation carried forward Method used

3.3.5 O3 Series

Seasonal Adjustment then LOCF

3.3.6 CO Series

Seasonal Adjustment then LOCF

3.3.8 Final Dataset

Final dataset for this time series analysis is obtained by converting hourly observations to averaged daily observations. This data will then be used to 10 day forecast.

4 Smoothed Time series plots

4.1 PM2.5

4.2 PM10

4.3 SO2

4.4 NO2

4.5 Ozone

5 PM2.5 Analysis

Following analysis is done by using tswge R package that we have used exstensively this course.

5.1 Stationarity

Condition-1 Mean Does not depend on time

Visual evidence suggests, there is no constant downward or upward trend exist in the realization. Therefore mean appears to be constant.

Condition-2 Variance does not depend on time

Except one spike near 1000, there is no strong evidence of non-constant variance.

Condition-3 Mean Does not depend on time

acfs from two halves of the time series shows no significant difference, correlation of \(X_{t_1}\) and \(X_{t_2}\) depends only on
\(t_{2}\)-\(t_{1}\) and not where they are in time.

There is no evidence against stationarity. Time series is therefore stationary.

Other Observations

  • No visible trend and seasonality in realization
  • ACF cuts off after 2nd lag. No indication of non-stationarity
  • Spectral density plot shows peak at 0, indicating wandering behavior.
  • There is no strong visual evidence of any other frequency.
  • First and second half of ACFs shows no significant difference.
  • This is stationary series.

5.3 Dicky-Fuller Test

\(H_{0}\) - Unit root present
\(H_{1}\) - Unit root not present. or time series is stationary.

Conclusion

pvalue = 0.01. Reject Ho. Suggests strong evidence against presence of unit root.
No visual evidence of sesonality so time series is stationary.

## Warning in adf.test(tt_pm25_ts$train_ts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  tt_pm25_ts$train_ts
## Dickey-Fuller = -8.9566, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

5.4 Model ID

AIC suggests ARMA(5,1) is the best fitted model. Ljung box test was performed for p=5 and q=1.
BIC suggests ARMA(1,1) is the best fitted model. Ljung box test was performed for p=1 and q=1.

## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  bic

5.5 White Noise Test

ljung box test provides strong evidence againts null hypothesis that series is white noise. p-value=0.

## Obs 0.5530452 0.1958933 0.06262632 0.02729153 0.03122277 0.02546675 0.01851925 0.0480942 0.08306008 0.08808314 0.07028984 0.05264368 0.06927908 0.08282109 0.08081704 0.06200663 0.02933554 0.009845052 0.0107568 0.02610704 0.05958953 0.06321152 0.05334019 0.0786623
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 601.5651
## 
## $df
## [1] 18
## 
## $pval
## [1] 0
## Obs 0.5530452 0.1958933 0.06262632 0.02729153 0.03122277 0.02546675 0.01851925 0.0480942 0.08306008 0.08808314 0.07028984 0.05264368 0.06927908 0.08282109 0.08081704 0.06200663 0.02933554 0.009845052 0.0107568 0.02610704 0.05958953 0.06321152 0.05334019 0.0786623
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 601.5651
## 
## $df
## [1] 22
## 
## $pval
## [1] 0

5.6 ARMA

5.6.1 Build a Model

## 
## Coefficients of Original polynomial:  
## 1.5757 -0.7831 0.2080 -0.0472 0.0261 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.9653B              1.0360               0.9653       0.0000
## 1-0.8283B+0.2765B^2    1.4981+-1.1717i      0.5258       0.1056
## 1+0.2179B+0.0979B^2   -1.1125+-2.9955i      0.3129       0.3066
##   
## 
## 
## Coefficients of Original polynomial:  
## 0.3625 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.3625B              2.7585               0.3625       0.0000
##   
## 

5.6.2 Residual diagnositcs

Ljung box test suggests that residuals are white noise.So assumptions on residuals are met.

## Obs -0.0001881523 -0.001022132 -0.00171976 -0.001977746 0.008193721 -0.009010721 -0.04192617 -0.009892159 0.02172941 0.01829942 0.009902166 -0.02476423 0.01458044 0.0179067 0.01651688 0.01349824 -0.0140909 -0.02150647 -0.01368855 -0.02030182 0.02597591 0.01940536 -0.02380709 0.0370772
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 12.79612
## 
## $df
## [1] 18
## 
## $pval
## [1] 0.8035395
## Obs -0.0001881523 -0.001022132 -0.00171976 -0.001977746 0.008193721 -0.009010721 -0.04192617 -0.009892159 0.02172941 0.01829942 0.009902166 -0.02476423 0.01458044 0.0179067 0.01651688 0.01349824 -0.0140909 -0.02150647 -0.01368855 -0.02030182 0.02597591 0.01940536 -0.02380709 0.0370772 0.0429581 -0.02662873 -0.01575458 0.0507779 0.02531561 0.01177715 -0.009671479 0.007327128 -0.02263186 -0.01164259 -0.01866861 0.02395932 -0.0429963 0.03045108 0.01068537 -0.006911302 0.04207008 -0.02639989 -0.03661697 -0.01178131 0.04600438 0.01048906 -0.01378056 -0.02356496
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 39.04114
## 
## $df
## [1] 42
## 
## $pval
## [1] 0.6016005

## Obs 0.001234195 0.0009126953 -0.01275297 -0.004962188 0.01939087 0.01393892 -0.01562802 0.01613415 0.04602301 0.04062574 0.03087398 -0.006125254 0.03261754 0.03631462 0.03546942 0.03184008 0.002743961 -0.005825104 0.001725101 -0.004982238 0.04040599 0.03408505 -0.009793972 0.04797018
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 23.24318
## 
## $df
## [1] 22
## 
## $pval
## [1] 0.3881086
## Obs 0.001234195 0.0009126953 -0.01275297 -0.004962188 0.01939087 0.01393892 -0.01562802 0.01613415 0.04602301 0.04062574 0.03087398 -0.006125254 0.03261754 0.03631462 0.03546942 0.03184008 0.002743961 -0.005825104 0.001725101 -0.004982238 0.04040599 0.03408505 -0.009793972 0.04797018 0.05148732 -0.01758795 -0.006874163 0.05939029 0.03483418 0.02273029 5.01446e-05 0.01640283 -0.01371569 -0.002239651 -0.01026027 0.03213572 -0.03538688 0.03799857 0.01787272 -0.000521831 0.0459199 -0.02343426 -0.03261477 -0.007585106 0.05106236 0.01597669 -0.008663508 -0.02062736
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 52.86132
## 
## $df
## [1] 46
## 
## $pval
## [1] 0.2262255

## [1] 86.54716

5.6.3 ARMA Forecast

PM2.5 Model: As per BIC
(1 - 0.36B ) (\(X_t\) - 86.54) = (1 + 0.28B) (\(a_t\))

Final PM2.5 Model: As per AIC
(1 - 1.58B + 0.79\(B^2\) - 0.22\(B^3\) + 0.06\(B^4\) - 0.03\(B^5\)) (\(X_t\) - 86.54) = (1 - 0.93B) (\(a_t\))

Method ASE Window ASE \(\sigma^2\) AIC
AIC 3547.03 5654.971 3659.862 8.21
BIC 3388.96 5764.101 3685.13 8.21

From the above AIC method seems appropriate for forecasting due to lowest windowed ASE.

## [1] 5654.971

5.7 ARIMA

From Realization,ACFs and spectral analysis we determined that Time series is stationary. But if we look at full length time series we see that there is pattern repeats every year. PM2.5 leves are lowest and then it peaks up and then goes down again. So lets take into account that sesonality. i.e. let’s model assuming that time series is not a stationary.

5.7.1 Build a Model

## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
## 
## Coefficients of Original polynomial:  
## 0.6744 -0.2332 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.6744B+0.2332B^2    1.4457+-1.4824i      0.4829       0.1270
##   
## 
## [1]  0.6743537 -0.2332345
## [1] 6900.42
## [1] 8.844924
## [1] 8.858832

5.7.2 Residual diagnositcs

Ljung box test suggests that residuals are white noise.So assumptions on residuals are met.

## Obs 0.00206265 -0.01316139 0.01587749 0.003910852 -0.02350566 -0.05181897 0.009743788 -0.03212132 0.0180016 0.009887968 -0.03121394 0.005250239 0.04500742 -0.04326562 0.02648095 0.01425458 0.002183925 -0.0219641 0.01378266 -0.01134597 0.05572137 -0.03172055 -0.01145437 0.03961221
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 19.25168
## 
## $df
## [1] 22
## 
## $pval
## [1] 0.6297398
## Obs 0.00206265 -0.01316139 0.01587749 0.003910852 -0.02350566 -0.05181897 0.009743788 -0.03212132 0.0180016 0.009887968 -0.03121394 0.005250239 0.04500742 -0.04326562 0.02648095 0.01425458 0.002183925 -0.0219641 0.01378266 -0.01134597 0.05572137 -0.03172055 -0.01145437 0.03961221 0.05927395 0.009596647 -0.02991589 0.03361678 0.06780902 -0.02902656 -0.04745418 0.05185142 -0.0542475 -0.003233708 -0.02039587 -0.01524462 -0.009991874 0.05167393 0.0007471355 -0.02078971 0.03198034 -0.006700283 0.0002952841 0.001895108 0.04857626 -0.03239038 -0.03342724 0.0004926967
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 50.81318
## 
## $df
## [1] 46
## 
## $pval
## [1] 0.2896885

5.7.3 ARIMA Forecast

Final ARIMA(2,367,0) PM2.5 Model: As per AIC

(1 - 0.67B + 0.23\(B^2\)) (1- \(B^{367}\)) (\(X_t\) - 86.54) = \(a_t\)

Method ASE Window ASE \(\sigma^2\) AIC
AIC 6971.01 10640.08 6900.42 8.84
## [1] 10640.08
## [1] 6971.011

5.8 STL Decompostion

5.8.2 Forecast

##  Call:
##  stl(x = ., s.window = "periodic", t.window = 13, robust = TRUE)
## 
##  Time.series components:
##     seasonal             trend             remainder        
##  Min.   :-79.81691   Min.   : -8.52031   Min.   :-275.6153  
##  1st Qu.:-28.96175   1st Qu.: 59.00506   1st Qu.: -16.9382  
##  Median : -9.34930   Median : 76.28166   Median :   0.5996  
##  Mean   : -0.51636   Mean   : 77.22246   Mean   :   9.8411  
##  3rd Qu.: 13.15090   3rd Qu.: 93.35446   3rd Qu.:  22.5770  
##  Max.   :282.15023   Max.   :215.74722   Max.   : 476.4804  
##  IQR:
##      STL.seasonal STL.trend STL.remainder data 
##      42.11        34.35     39.52         81.17
##    %  51.9         42.3      48.7         100.0
## 
##  Weights:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.8041  0.9452  0.8053  0.9875  1.0000 
## 
##  Other components: List of 5
##  $ win  : Named num [1:3] 14411 13 365
##  $ deg  : Named int [1:3] 0 1 1
##  $ jump : Named num [1:3] 1442 2 37
##  $ inner: int 1
##  $ outer: int 15
## 
## Forecast method: STL +  Random walk
## 
## Model Information:
## Call: rwf(y = x, h = h, drift = FALSE, level = level) 
## 
## Residual sd: 72.5797 
## 
## Error measures:
##                        ME    RMSE      MAE       MPE     MAPE    MASE
## Training set -0.001751472 72.5545 47.36104 -48.22309 119.3651 0.65097
##                    ACF1
## Training set -0.1872593
## 
## Forecasts:
##          Point Forecast       Lo 80    Hi 80      Lo 95    Hi 95
## 4.947945     106.142765   13.160429 199.1251  -36.06145 248.3470
## 4.950685       4.587500 -126.909381 136.0844 -196.51962 205.6946
## 4.953425      -5.846121 -166.896252 155.2040 -252.15104 240.4588
## 4.956164      72.643006 -113.321667 258.6077 -211.76542 357.0514
## 4.958904     211.714023    3.799199 419.6288 -106.26426 529.6923
## 4.961644     304.579923   76.820644 532.3392  -43.74784 652.9077
## 4.964384      48.052450 -197.955688 294.0606 -328.18453 424.2894
## 4.967123     -11.308621 -274.302383 251.6851 -413.52287 390.9056
## 4.969863      -6.072099 -285.019107 272.8749 -432.68473 420.5405
## 4.972603     -11.530115 -305.566080 282.5058 -461.21932 438.1591
## 4.975342      78.591514 -229.796007 386.9790 -393.04650 550.2295
## 4.978082     116.861087 -205.239174 438.9613 -375.74875 609.4709
## 4.980822      96.755437 -238.497143 432.0080 -415.96914 609.4800
## 4.983562      33.244152 -314.663893 381.1522 -498.83529 565.3236
## 4.986301     138.491927 -221.627112 498.6110 -412.26262 689.2465
## 4.989041      25.291170 -346.638174 397.2205 -543.52568 594.1080
## 4.991781      -3.334701 -386.710694 380.0413 -589.65769 582.9883
## 4.994521     -25.964137 -420.454780 368.5265 -629.28551 577.3572
## 4.997260      -2.664316 -407.964923 402.6363 -622.51810 617.1895
## 5.000000       3.894547 -411.935102 419.7242 -632.06202 639.8511

5.9 VAR Model

5.9.1 Build a Model

s=365 which is 12 months or annual for dataset with daily observations.

ASE - 7.8

Window ASE - 243.682

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                  1           2           3           4           5           6
## AIC(n)    8.923034    8.886223    8.888011    8.889527    8.890296    8.891640
## HQ(n)     8.926543    8.891486    8.895029    8.898299    8.900823    8.903922
## SC(n)     8.932300    8.900121    8.906542    8.912690    8.918092    8.924068
## FPE(n) 7502.821527 7231.652947 7244.598101 7255.583721 7261.167943 7270.935661
##                  7           8           9          10          11          12
## AIC(n)    8.890998    8.892857    8.894391    8.894176    8.895989    8.895835
## HQ(n)     8.905034    8.908648    8.911936    8.913476    8.917044    8.918644
## SC(n)     8.928059    8.934551    8.940717    8.945135    8.951581    8.956059
## FPE(n) 7266.270294 7279.790086 7290.968817 7289.399782 7302.633220 7301.508788
##                 13          14          15          16          17          18
## AIC(n)    8.896537    8.897703    8.896333    8.897489    8.899132    8.900951
## HQ(n)     8.921101    8.924021    8.924406    8.927316    8.930713    8.934287
## SC(n)     8.961394    8.967193    8.970455    8.976243    8.982519    8.988971
## FPE(n) 7306.640639 7315.166735 7305.154733 7313.602659 7325.633236 7338.975985
##                 19          20
## AIC(n)    8.902811    8.903420
## HQ(n)     8.937902    8.940265
## SC(n)     8.995464    9.000705
## FPE(n) 7352.645625 7357.128467
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: pm25, so2, no2, co 
## Deterministic variables: const 
## Sample size: 1093 
## Log Likelihood: -8727.368 
## Roots of the characteristic polynomial:
## 0.8748 0.5171 0.4971 0.4971 0.322 0.1459 0.1459 0.01251
## Call:
## VAR(y = X[2:1096, ], p = 2, type = "const")
## 
## 
## Estimation results for equation pm25: 
## ===================================== 
## pm25 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)    
## pm25.l1  0.60100    0.03079  19.520  < 2e-16 ***
## so2.l1   7.16300    3.95296   1.812   0.0703 .  
## no2.l1  -3.94513    6.61785  -0.596   0.5512    
## co.l1   -3.64588    6.72920  -0.542   0.5881    
## pm25.l2 -0.18898    0.04378  -4.317 1.73e-05 ***
## so2.l2  -1.17589    3.96590  -0.297   0.7669    
## no2.l2   0.90441    6.58957   0.137   0.8909    
## co.l2   -1.50716    6.23070  -0.242   0.8089    
## const    0.46170    2.83800   0.163   0.8708    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 84.98 on 1084 degrees of freedom
## Multiple R-Squared: 0.2926,  Adjusted R-squared: 0.2873 
## F-statistic: 56.03 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation so2: 
## ==================================== 
## so2 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm25.l1  0.0068701  0.0002814  24.412  < 2e-16 ***
## so2.l1   0.4825323  0.0361318  13.355  < 2e-16 ***
## no2.l1   0.2541764  0.0604901   4.202 2.86e-05 ***
## co.l1   -0.3208801  0.0615078  -5.217 2.18e-07 ***
## pm25.l2 -0.0037479  0.0004001  -9.366  < 2e-16 ***
## so2.l2   0.0060907  0.0362501   0.168 0.866600    
## no2.l2  -0.2245443  0.0602315  -3.728 0.000203 ***
## co.l2    0.0827338  0.0569514   1.453 0.146594    
## const   -0.1784372  0.0259406  -6.879 1.02e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.7768 on 1084 degrees of freedom
## Multiple R-Squared: 0.5265,  Adjusted R-squared: 0.523 
## F-statistic: 150.7 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation no2: 
## ==================================== 
## no2 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm25.l1  0.0045265  0.0001668  27.136  < 2e-16 ***
## so2.l1  -0.0819185  0.0214166  -3.825 0.000138 ***
## no2.l1   0.8000967  0.0358545  22.315  < 2e-16 ***
## co.l1   -0.2416602  0.0364578  -6.628 5.34e-11 ***
## pm25.l2 -0.0025530  0.0002372 -10.764  < 2e-16 ***
## so2.l2  -0.0178499  0.0214866  -0.831 0.406302    
## no2.l2   0.0635419  0.0357013   1.780 0.075385 .  
## co.l2    0.0077309  0.0337570   0.229 0.818900    
## const   -0.0358008  0.0153759  -2.328 0.020075 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.4604 on 1084 degrees of freedom
## Multiple R-Squared: 0.7258,  Adjusted R-squared: 0.7238 
## F-statistic: 358.7 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation co: 
## =================================== 
## co = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm25.l1  0.0065955  0.0001755  37.579  < 2e-16 ***
## so2.l1  -0.0265487  0.0225342  -1.178 0.238993    
## no2.l1   0.0912619  0.0377256   2.419 0.015723 *  
## co.l1    0.3763464  0.0383603   9.811  < 2e-16 ***
## pm25.l2 -0.0031176  0.0002496 -12.493  < 2e-16 ***
## so2.l2  -0.0132235  0.0226079  -0.585 0.558731    
## no2.l2  -0.0822854  0.0375643  -2.191 0.028699 *  
## co.l2    0.0009036  0.0355186   0.025 0.979708    
## const   -0.0564721  0.0161782  -3.491 0.000501 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.4845 on 1084 degrees of freedom
## Multiple R-Squared: 0.6748,  Adjusted R-squared: 0.6724 
## F-statistic: 281.2 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##          pm25    so2    no2     co
## pm25 7222.133 9.6062 5.9590 4.7220
## so2     9.606 0.6034 0.1397 0.2005
## no2     5.959 0.1397 0.2120 0.1197
## co      4.722 0.2005 0.1197 0.2347
## 
## Correlation matrix of residuals:
##        pm25    so2    no2     co
## pm25 1.0000 0.1455 0.1523 0.1147
## so2  0.1455 1.0000 0.3907 0.5329
## no2  0.1523 0.3907 1.0000 0.5367
## co   0.1147 0.5329 0.5367 1.0000
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -430.82  -43.71   -1.24   45.54  537.74 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## pm25.l1  0.60100    0.03079  19.520  < 2e-16 ***
## so2.l1   7.16300    3.95296   1.812   0.0703 .  
## no2.l1  -3.94513    6.61785  -0.596   0.5512    
## co.l1   -3.64588    6.72920  -0.542   0.5881    
## pm25.l2 -0.18898    0.04378  -4.317 1.73e-05 ***
## so2.l2  -1.17589    3.96590  -0.297   0.7669    
## no2.l2   0.90441    6.58957   0.137   0.8909    
## co.l2   -1.50716    6.23070  -0.242   0.8089    
## const    0.46170    2.83800   0.163   0.8708    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 84.98 on 1084 degrees of freedom
## Multiple R-squared:  0.2926, Adjusted R-squared:  0.2873 
## F-statistic: 56.03 on 8 and 1084 DF,  p-value: < 2.2e-16

5.9.2 Residual diagnositcs

## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation pm25: 
## ========================================= 
## Call:
## pm25 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##    pm25.l1     so2.l1     no2.l1      co.l1    pm25.l2     so2.l2     no2.l2 
##  0.6009966  7.1629968 -3.9451272 -3.6458849 -0.1889785 -1.1758934  0.9044107 
##      co.l2      const 
## -1.5071645  0.4617003 
## 
## 
## Estimated coefficients for equation so2: 
## ======================================== 
## Call:
## so2 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##      pm25.l1       so2.l1       no2.l1        co.l1      pm25.l2       so2.l2 
##  0.006870089  0.482532295  0.254176396 -0.320880117 -0.003747859  0.006090706 
##       no2.l2        co.l2        const 
## -0.224544329  0.082733788 -0.178437175 
## 
## 
## Estimated coefficients for equation no2: 
## ======================================== 
## Call:
## no2 = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##      pm25.l1       so2.l1       no2.l1        co.l1      pm25.l2       so2.l2 
##  0.004526537 -0.081918482  0.800096692 -0.241660244 -0.002553024 -0.017849861 
##       no2.l2        co.l2        const 
##  0.063541900  0.007730899 -0.035800769 
## 
## 
## Estimated coefficients for equation co: 
## ======================================= 
## Call:
## co = pm25.l1 + so2.l1 + no2.l1 + co.l1 + pm25.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##       pm25.l1        so2.l1        no2.l1         co.l1       pm25.l2 
##  0.0065954713 -0.0265487340  0.0912619438  0.3763463606 -0.0031175859 
##        so2.l2        no2.l2         co.l2         const 
## -0.0132235428 -0.0822853512  0.0009036387 -0.0564721133
## $pm25
##             fcst     lower    upper       CI
##  [1,] -17.588924 -184.1528 148.9750 166.5639
##  [2,] -10.674280 -205.5597 184.2111 194.8854
##  [3,]  -5.087442 -202.5318 192.3569 197.4444
##  [4,]  -2.435193 -200.0108 195.1404 197.5756
##  [5,]  -2.020706 -199.8784 195.8370 197.8577
##  [6,]  -2.387554 -200.3523 195.5772 197.9648
##  [7,]  -2.719783 -200.7004 195.2608 197.9806
##  [8,]  -2.827869 -200.8138 195.1581 197.9859
##  [9,]  -2.787196 -200.7779 195.2035 197.9907
## [10,]  -2.699788 -200.6939 195.2943 197.9941
## [11,]  -2.618860 -200.6154 195.3776 197.9965
## [12,]  -2.556531 -200.5549 195.4418 197.9983
## [13,]  -2.508228 -200.5080 195.4915 197.9997
## [14,]  -2.467785 -200.4686 195.5330 198.0008
## [15,]  -2.431924 -200.4335 195.5697 198.0016
## [16,]  -2.399677 -200.4019 195.6026 198.0022
## [17,]  -2.370938 -200.3737 195.6318 198.0027
## [18,]  -2.345624 -200.3487 195.6575 198.0031
## [19,]  -2.323478 -200.3268 195.6799 198.0034
## [20,]  -2.304139 -200.3077 195.6994 198.0036
## [21,]  -2.287243 -200.2910 195.7165 198.0037
## [22,]  -2.272465 -200.2763 195.7314 198.0039
## [23,]  -2.259534 -200.2635 195.7444 198.0040
## [24,]  -2.248218 -200.2523 195.7558 198.0040
## [25,]  -2.238317 -200.2424 195.7658 198.0041
## [26,]  -2.229654 -200.2338 195.7745 198.0041
## [27,]  -2.222076 -200.2262 195.7821 198.0042
## [28,]  -2.215448 -200.2196 195.7887 198.0042
## [29,]  -2.209649 -200.2139 195.7946 198.0042
## [30,]  -2.204576 -200.2088 195.7996 198.0042
## 
## $so2
##              fcst     lower    upper       CI
##  [1,] -0.07540616 -1.597874 1.447061 1.522467
##  [2,] -0.35016317 -2.441867 1.741541 2.091704
##  [3,] -0.30499131 -2.483754 1.873771 2.178762
##  [4,] -0.28393199 -2.475655 1.907791 2.191723
##  [5,] -0.28973443 -2.496095 1.916626 2.206361
##  [6,] -0.30432824 -2.515769 1.907112 2.211440
##  [7,] -0.31558777 -2.527615 1.896440 2.212028
##  [8,] -0.32080883 -2.532857 1.891239 2.212048
##  [9,] -0.32207432 -2.534155 1.890006 2.212080
## [10,] -0.32182154 -2.533916 1.890273 2.212095
## [11,] -0.32136477 -2.533464 1.890735 2.212099
## [12,] -0.32107999 -2.533182 1.891022 2.212102
## [13,] -0.32093747 -2.533042 1.891167 2.212105
## [14,] -0.32084201 -2.532948 1.891264 2.212106
## [15,] -0.32074427 -2.532851 1.891363 2.212107
## [16,] -0.32063773 -2.532746 1.891470 2.212108
## [17,] -0.32053155 -2.532640 1.891577 2.212109
## [18,] -0.32043388 -2.532543 1.891675 2.212109
## [19,] -0.32034782 -2.532457 1.891762 2.212110
## [20,] -0.32027300 -2.532383 1.891837 2.212110
## [21,] -0.32020787 -2.532318 1.891902 2.212110
## [22,] -0.32015093 -2.532261 1.891959 2.212110
## [23,] -0.32010101 -2.532211 1.892009 2.212110
## [24,] -0.32005725 -2.532168 1.892053 2.212110
## [25,] -0.32001892 -2.532129 1.892092 2.212110
## [26,] -0.31998536 -2.532096 1.892125 2.212111
## [27,] -0.31995601 -2.532067 1.892155 2.212111
## [28,] -0.31993033 -2.532041 1.892180 2.212111
## [29,] -0.31990787 -2.532019 1.892203 2.212111
## [30,] -0.31988822 -2.531999 1.892222 2.212111
## 
## $no2
##             fcst      lower    upper        CI
##  [1,] 0.37901644 -0.5234018 1.281435 0.9024182
##  [2,] 0.28915445 -1.0722406 1.650549 1.3613950
##  [3,] 0.29412286 -1.1841947 1.772440 1.4783176
##  [4,] 0.28598933 -1.2516224 1.823601 1.5376117
##  [5,] 0.26408712 -1.3203241 1.848498 1.5844112
##  [6,] 0.23786404 -1.3808420 1.856570 1.6187061
##  [7,] 0.21445888 -1.4294318 1.858350 1.6438907
##  [8,] 0.19569806 -1.4671844 1.858581 1.6628825
##  [9,] 0.18067015 -1.4966103 1.857951 1.6772805
## [10,] 0.16809027 -1.5201085 1.856289 1.6881988
## [11,] 0.15715019 -1.5393467 1.853647 1.6964969
## [12,] 0.14749281 -1.5553256 1.850311 1.7028184
## [13,] 0.13897305 -1.5686680 1.846614 1.7076410
## [14,] 0.13149452 -1.5798288 1.842818 1.7113233
## [15,] 0.12495353 -1.5891827 1.839090 1.7141363
## [16,] 0.11923911 -1.5970469 1.835525 1.7162860
## [17,] 0.11424530 -1.6036840 1.832175 1.7179293
## [18,] 0.10987850 -1.6093073 1.829064 1.7191858
## [19,] 0.10605843 -1.6140883 1.826205 1.7201467
## [20,] 0.10271621 -1.6181655 1.823598 1.7208817
## [21,] 0.09979217 -1.6216519 1.821236 1.7214440
## [22,] 0.09723414 -1.6246401 1.819108 1.7218742
## [23,] 0.09499638 -1.6272070 1.817200 1.7222034
## [24,] 0.09303882 -1.6294164 1.815494 1.7224552
## [25,] 0.09132636 -1.6313216 1.813974 1.7226479
## [26,] 0.08982829 -1.6329671 1.812624 1.7227954
## [27,] 0.08851777 -1.6343905 1.811426 1.7229083
## [28,] 0.08737133 -1.6356233 1.810366 1.7229946
## [29,] 0.08636841 -1.6366923 1.809429 1.7230607
## [30,] 0.08549106 -1.6376202 1.808602 1.7231113
## 
## $co
##              fcst      lower     upper        CI
##  [1,] -0.02305443 -0.9725642 0.9264553 0.9495098
##  [2,] -0.19882738 -1.7359708 1.3383160 1.5371434
##  [3,] -0.14139301 -1.7958963 1.5131103 1.6545033
##  [4,] -0.09436404 -1.7496767 1.5609486 1.6553126
##  [5,] -0.07884511 -1.7392083 1.5815181 1.6603632
##  [6,] -0.07995100 -1.7435290 1.5836271 1.6635781
##  [7,] -0.08419167 -1.7482756 1.5798922 1.6640839
##  [8,] -0.08632248 -1.7504395 1.5777946 1.6641171
##  [9,] -0.08630409 -1.7504669 1.5778587 1.6641628
## [10,] -0.08541898 -1.7496108 1.5787729 1.6641919
## [11,] -0.08453761 -1.7487438 1.5796686 1.6642062
## [12,] -0.08392261 -1.7481386 1.5802934 1.6642160
## [13,] -0.08352631 -1.7477500 1.5806974 1.6642237
## [14,] -0.08324276 -1.7474722 1.5809867 1.6642295
## [15,] -0.08300541 -1.7472393 1.5812285 1.6642339
## [16,] -0.08279082 -1.7470281 1.5814464 1.6642372
## [17,] -0.08259637 -1.7468362 1.5816434 1.6642398
## [18,] -0.08242373 -1.7466655 1.5818180 1.6642418
## [19,] -0.08227283 -1.7465161 1.5819705 1.6642433
## [20,] -0.08214161 -1.7463861 1.5821028 1.6642444
## [21,] -0.08202740 -1.7462727 1.5822179 1.6642453
## [22,] -0.08192770 -1.7461737 1.5823183 1.6642460
## [23,] -0.08184050 -1.7460870 1.5824060 1.6642465
## [24,] -0.08176419 -1.7460111 1.5824827 1.6642469
## [25,] -0.08169741 -1.7459446 1.5825498 1.6642472
## [26,] -0.08163899 -1.7458864 1.5826085 1.6642475
## [27,] -0.08158788 -1.7458355 1.5826598 1.6642476
## [28,] -0.08154317 -1.7457909 1.5827046 1.6642478
## [29,] -0.08150406 -1.7457519 1.5827438 1.6642479
## [30,] -0.08146985 -1.7457178 1.5827781 1.6642480

6 PM10 Analysis

6.1 Stationarity

First plot the data.

  • No visible trend and seasonality in realization
  • ACF cuts off after 2nd lag. No indication of non-stationarity
  • Spectral density plot shows peak at 0, indicating wandering behavior
  • First and second half of ACFs shows no significant difference.
  • This is stationary series.

6.3 Dicky-Fuller Test

\(H_{0}\) - Unit root present
\(H_{1}\) - Unit root not present. or time series is stationary.

Conclusion

pvalue = 0.01. Reject Ho. Suggests strong evidence against presence of unit root.
No visual evidence of sesonality so time series is stationary.

## Warning in adf.test(tt_pm10_ts$train_ts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  tt_pm10_ts$train_ts
## Dickey-Fuller = -8.3554, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

6.4 Model ID

## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic

AR(6,1) turns out to be the best mode as per AIC function.

6.5 Estimate Parameters

6.5.1 Estimation

## 
## Coefficients of Original polynomial:  
## 1.5367 -0.7205 0.1915 -0.0362 -0.0335 0.0448 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.9732B              1.0275               0.9732       0.0000
## 1-1.0842B+0.4133B^2    1.3116+-0.8362i      0.6429       0.0903
## 1+0.1172B+0.2761B^2   -0.2123+-1.8911i      0.5255       0.2678
## 1+0.4035B             -2.4781               0.4035       0.5000
##   
## 
## [1]  1.53668859 -0.72052819  0.19154879 -0.03623074 -0.03352981  0.04482387
## [1] 0.934823
## [1] 110.7666

6.5.2 Residual diagnositcs

## Obs -0.0008863582 -0.002190483 0.0003356206 -0.0007377869 0.004632811 0.006264072 -0.03551586 -0.006313547 0.008870318 0.04421087 -0.01537624 -0.01848627 0.01012401 0.01853992 -0.001019537 0.0108432 -0.02166164 -0.05481056 -0.01098366 -0.0226941 0.004681241 0.02812718 -0.007642125 0.04244317
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 16.5282
## 
## $df
## [1] 17
## 
## $pval
## [1] 0.4867469
## Obs -0.0008863582 -0.002190483 0.0003356206 -0.0007377869 0.004632811 0.006264072 -0.03551586 -0.006313547 0.008870318 0.04421087 -0.01537624 -0.01848627 0.01012401 0.01853992 -0.001019537 0.0108432 -0.02166164 -0.05481056 -0.01098366 -0.0226941 0.004681241 0.02812718 -0.007642125 0.04244317 0.0167526 -0.004519385 -0.002749417 0.05848487 0.03145483 0.01406576 -0.01134344 0.02369013 -0.008841578 -0.03103694 -0.01795642 0.02331942 -0.03734233 0.0400099 0.03467535 -0.02300888 0.0487511 -0.02985911 -0.00898804 -0.03063819 0.04447112 -0.004417444 -0.00231838 -0.008690027
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 44.07721
## 
## $df
## [1] 41
## 
## $pval
## [1] 0.3427642

6.6 Final Model

Final PM10 Model:

(1 - 1.53B + 0.72\(B^2\) - 0.18\(B^3\) + 0.04\(B^4\) + 0.03\(B^5\) - 0.05\(B^5\) ) (\(X_t\) - 110.111) = (1 - 0.93B) (\(a_t\))

6.7 PM10 Forecasting

6.9 VAR Model

6.9.1 Build a Model

s=365 which is 12 months or annual for dataset with daily observations.

ASE - 7.8

Window ASE - 243.682

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                  1           2           3           4           5           6
## AIC(n)    9.079985    9.058256    9.059817    9.061190    9.062818    9.064636
## HQ(n)     9.083494    9.063520    9.066835    9.069963    9.073345    9.076917
## SC(n)     9.089250    9.072154    9.078347    9.084354    9.090614    9.097064
## FPE(n) 8777.833573 8589.158332 8602.576288 8614.399600 8628.433481 8644.131333
##                  7           8           9          10          11          12
## AIC(n)    9.064204    9.065835    9.067367    9.066914    9.068012    9.069628
## HQ(n)     9.078240    9.081625    9.084912    9.086214    9.089066    9.092437
## SC(n)     9.101265    9.107528    9.113693    9.117873    9.123603    9.129853
## FPE(n) 8640.399726 8654.502003 8667.772137 8663.850815 8673.367941 8687.405296
##                 13          14          15          16          17          18
## AIC(n)    9.070992    9.072346    9.073854    9.074456    9.076291    9.077695
## HQ(n)     9.095555    9.098664    9.101927    9.104283    9.107872    9.111031
## SC(n)     9.135848    9.141835    9.147976    9.153211    9.159678    9.165715
## FPE(n) 8699.258473 8711.049117 8724.204033 8729.457924 8745.492101 8757.784866
##                 19          20
## AIC(n)    9.079035    9.080093
## HQ(n)     9.114126    9.116938
## SC(n)     9.171688    9.177378
## FPE(n) 8769.541413 8778.822332
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: pm10, so2, no2, co 
## Deterministic variables: const 
## Sample size: 1093 
## Log Likelihood: -8899.737 
## Roots of the characteristic polynomial:
## 0.8755 0.524 0.4413 0.4413 0.4216 0.1343 0.1343 0.02679
## Call:
## VAR(y = X10[2:1096, ], p = 2, type = "const")
## 
## 
## Estimation results for equation pm10: 
## ===================================== 
## pm10 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##          Estimate Std. Error t value Pr(>|t|)    
## pm10.l1   0.56499    0.03086  18.307  < 2e-16 ***
## so2.l1    7.57059    4.31061   1.756  0.07933 .  
## no2.l1   -6.12900    7.21680  -0.849  0.39592    
## co.l1   -11.48029    6.99430  -1.641  0.10101    
## pm10.l2  -0.10999    0.04040  -2.722  0.00658 ** 
## so2.l2   -0.65644    4.30394  -0.153  0.87880    
## no2.l2    4.71072    7.15757   0.658  0.51059    
## co.l2    -2.82618    6.72999  -0.420  0.67461    
## const     1.59865    3.09312   0.517  0.60537    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 92.33 on 1084 degrees of freedom
## Multiple R-Squared: 0.2656,  Adjusted R-squared: 0.2602 
## F-statistic:    49 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation so2: 
## ==================================== 
## so2 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm10.l1  0.0062081  0.0002626  23.638  < 2e-16 ***
## so2.l1   0.4949450  0.0366824  13.493  < 2e-16 ***
## no2.l1   0.2138302  0.0614133   3.482 0.000518 ***
## co.l1   -0.2582311  0.0595199  -4.339 1.57e-05 ***
## pm10.l2 -0.0034857  0.0003438 -10.139  < 2e-16 ***
## so2.l2   0.0017399  0.0366256   0.048 0.962118    
## no2.l2  -0.1966254  0.0609094  -3.228 0.001283 ** 
## co.l2    0.0565979  0.0572708   0.988 0.323251    
## const   -0.1807393  0.0263217  -6.867 1.11e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.7857 on 1084 degrees of freedom
## Multiple R-Squared: 0.5156,  Adjusted R-squared: 0.512 
## F-statistic: 144.2 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation no2: 
## ==================================== 
## no2 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm10.l1  0.0040068  0.0001578  25.395  < 2e-16 ***
## so2.l1  -0.0710187  0.0220378  -3.223  0.00131 ** 
## no2.l1   0.7783590  0.0368956  21.096  < 2e-16 ***
## co.l1   -0.1963243  0.0357580  -5.490 4.99e-08 ***
## pm10.l2 -0.0024410  0.0002065 -11.818  < 2e-16 ***
## so2.l2  -0.0191667  0.0220037  -0.871  0.38391    
## no2.l2   0.0789267  0.0365928   2.157  0.03123 *  
## co.l2   -0.0094685  0.0344068  -0.275  0.78322    
## const   -0.0353871  0.0158134  -2.238  0.02544 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.472 on 1084 degrees of freedom
## Multiple R-Squared: 0.7118,  Adjusted R-squared: 0.7097 
## F-statistic: 334.7 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation co: 
## =================================== 
## co = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## pm10.l1  0.0056389  0.0001758  32.073  < 2e-16 ***
## so2.l1  -0.0103322  0.0245567  -0.421  0.67402    
## no2.l1   0.0671079  0.0411127   1.632  0.10291    
## co.l1    0.4562014  0.0398451  11.449  < 2e-16 ***
## pm10.l2 -0.0030460  0.0002302 -13.235  < 2e-16 ***
## so2.l2  -0.0128658  0.0245187  -0.525  0.59987    
## no2.l2  -0.0694859  0.0407753  -1.704  0.08865 .  
## co.l2   -0.0178999  0.0383395  -0.467  0.64068    
## const   -0.0547419  0.0176209  -3.107  0.00194 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.526 on 1084 degrees of freedom
## Multiple R-Squared: 0.6167,  Adjusted R-squared: 0.6139 
## F-statistic:   218 on 8 and 1084 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##          pm10    so2    no2     co
## pm10 8524.639 9.2868 6.9361 3.9347
## so2     9.287 0.6173 0.1527 0.2291
## no2     6.936 0.1527 0.2228 0.1420
## co      3.935 0.2291 0.1420 0.2767
## 
## Correlation matrix of residuals:
##         pm10    so2    no2      co
## pm10 1.00000 0.1280 0.1592 0.08102
## so2  0.12802 1.0000 0.4117 0.55446
## no2  0.15915 0.4117 1.0000 0.57184
## co   0.08102 0.5545 0.5718 1.00000

6.9.2 Residual diagnositcs

## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation pm10: 
## ========================================= 
## Call:
## pm10 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##     pm10.l1      so2.l1      no2.l1       co.l1     pm10.l2      so2.l2 
##   0.5649867   7.5705851  -6.1290000 -11.4802918  -0.1099856  -0.6564396 
##      no2.l2       co.l2       const 
##   4.7107155  -2.8261769   1.5986505 
## 
## 
## Estimated coefficients for equation so2: 
## ======================================== 
## Call:
## so2 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##      pm10.l1       so2.l1       no2.l1        co.l1      pm10.l2       so2.l2 
##  0.006208144  0.494944997  0.213830182 -0.258231074 -0.003485726  0.001739946 
##       no2.l2        co.l2        const 
## -0.196625409  0.056597862 -0.180739315 
## 
## 
## Estimated coefficients for equation no2: 
## ======================================== 
## Call:
## no2 = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##      pm10.l1       so2.l1       no2.l1        co.l1      pm10.l2       so2.l2 
##  0.004006803 -0.071018687  0.778358996 -0.196324294 -0.002440981 -0.019166749 
##       no2.l2        co.l2        const 
##  0.078926700 -0.009468485 -0.035387057 
## 
## 
## Estimated coefficients for equation co: 
## ======================================= 
## Call:
## co = pm10.l1 + so2.l1 + no2.l1 + co.l1 + pm10.l2 + so2.l2 + no2.l2 + co.l2 + const 
## 
##      pm10.l1       so2.l1       no2.l1        co.l1      pm10.l2       so2.l2 
##  0.005638880 -0.010332235  0.067107904  0.456201352 -0.003046009 -0.012865848 
##       no2.l2        co.l2        const 
## -0.069485927 -0.017899889 -0.054741856
## $pm10
##              fcst     lower    upper       CI
##  [1,] -13.5041490 -194.4656 167.4573 180.9615
##  [2,]  -6.7159086 -214.8079 201.3761 208.0920
##  [3,]  -1.3132034 -212.0697 209.4433 210.7565
##  [4,]   1.0505905 -209.9582 212.0594 211.0088
##  [5,]   1.4071181 -209.7122 212.5264 211.1193
##  [6,]   1.1251378 -210.0220 212.2723 211.1471
##  [7,]   0.8643657 -210.2867 212.0154 211.1511
##  [8,]   0.7489306 -210.4033 211.9012 211.1523
##  [9,]   0.7230406 -210.4298 211.8758 211.1528
## [10,]   0.7271724 -210.4258 211.8802 211.1530
## [11,]   0.7348587 -210.4182 211.8880 211.1531
## [12,]   0.7405423 -210.4126 211.8937 211.1532
## [13,]   0.7451434 -210.4081 211.8983 211.1532
## [14,]   0.7497379 -210.4035 211.9030 211.1532
## [15,]   0.7544900 -210.3988 211.9077 211.1532
## [16,]   0.7591461 -210.3941 211.9124 211.1533
## [17,]   0.7634681 -210.3898 211.9167 211.1533
## [18,]   0.7673514 -210.3859 211.9206 211.1533
## [19,]   0.7707896 -210.3825 211.9241 211.1533
## [20,]   0.7738177 -210.3795 211.9271 211.1533
## [21,]   0.7764797 -210.3768 211.9298 211.1533
## [22,]   0.7788175 -210.3745 211.9321 211.1533
## [23,]   0.7808684 -210.3724 211.9342 211.1533
## [24,]   0.7826663 -210.3706 211.9360 211.1533
## [25,]   0.7842416 -210.3691 211.9375 211.1533
## [26,]   0.7856213 -210.3677 211.9389 211.1533
## [27,]   0.7868296 -210.3665 211.9401 211.1533
## [28,]   0.7878876 -210.3654 211.9412 211.1533
## [29,]   0.7888140 -210.3645 211.9421 211.1533
## [30,]   0.7896252 -210.3637 211.9429 211.1533
## 
## $so2
##              fcst     lower    upper       CI
##  [1,] -0.05407453 -1.594017 1.485867 1.539942
##  [2,] -0.37114473 -2.469927 1.727637 2.098782
##  [3,] -0.32889045 -2.513849 1.856068 2.184958
##  [4,] -0.29775626 -2.496339 1.900827 2.198583
##  [5,] -0.29737269 -2.506438 1.911692 2.209065
##  [6,] -0.30775681 -2.519756 1.904242 2.211999
##  [7,] -0.31620420 -2.528527 1.896119 2.212323
##  [8,] -0.32010372 -2.532449 1.892241 2.212345
##  [9,] -0.32113302 -2.533489 1.891223 2.212356
## [10,] -0.32110351 -2.533465 1.891258 2.212362
## [11,] -0.32089555 -2.533260 1.891469 2.212364
## [12,] -0.32074028 -2.533106 1.891626 2.212366
## [13,] -0.32063275 -2.533000 1.891735 2.212368
## [14,] -0.32053896 -2.532907 1.891830 2.212368
## [15,] -0.32044603 -2.532815 1.891923 2.212369
## [16,] -0.32035550 -2.532725 1.892014 2.212370
## [17,] -0.32027158 -2.532642 1.892099 2.212370
## [18,] -0.32019647 -2.532567 1.892174 2.212370
## [19,] -0.32013030 -2.532501 1.892240 2.212371
## [20,] -0.32007225 -2.532443 1.892299 2.212371
## [21,] -0.32002137 -2.532392 1.892350 2.212371
## [22,] -0.31997674 -2.532348 1.892394 2.212371
## [23,] -0.31993762 -2.532309 1.892434 2.212371
## [24,] -0.31990334 -2.532275 1.892468 2.212371
## [25,] -0.31987331 -2.532245 1.892498 2.212371
## [26,] -0.31984701 -2.532218 1.892524 2.212371
## [27,] -0.31982398 -2.532195 1.892547 2.212371
## [28,] -0.31980382 -2.532175 1.892568 2.212371
## [29,] -0.31978617 -2.532158 1.892585 2.212371
## [30,] -0.31977071 -2.532142 1.892601 2.212371
## 
## $no2
##             fcst      lower    upper        CI
##  [1,] 0.39723773 -0.5279198 1.322395 0.9251575
##  [2,] 0.27915314 -1.0876899 1.645996 1.3668431
##  [3,] 0.28220112 -1.1964246 1.760827 1.4786257
##  [4,] 0.27700738 -1.2606740 1.814689 1.5376813
##  [5,] 0.25720357 -1.3261951 1.840602 1.5833986
##  [6,] 0.23298910 -1.3843086 1.850287 1.6172977
##  [7,] 0.21124345 -1.4314039 1.853891 1.6426473
##  [8,] 0.19349667 -1.4683392 1.855333 1.6618359
##  [9,] 0.17893618 -1.4974587 1.855331 1.6763949
## [10,] 0.16656038 -1.5209008 1.854022 1.6874612
## [11,] 0.15577574 -1.5401165 1.851668 1.6958922
## [12,] 0.14630069 -1.5560265 1.848628 1.7023272
## [13,] 0.13798032 -1.5692640 1.845225 1.7072443
## [14,] 0.13068952 -1.5803149 1.841694 1.7110044
## [15,] 0.12430838 -1.5895730 1.838190 1.7138814
## [16,] 0.11872440 -1.5973590 1.834808 1.7160834
## [17,] 0.11383694 -1.6039326 1.831607 1.7177696
## [18,] 0.10955823 -1.6095027 1.828619 1.7190610
## [19,] 0.10581207 -1.6142382 1.825862 1.7200503
## [20,] 0.10253213 -1.6182761 1.823340 1.7208082
## [21,] 0.09966040 -1.6217286 1.821049 1.7213890
## [22,] 0.09714611 -1.6246880 1.818980 1.7218341
## [23,] 0.09494477 -1.6272304 1.817120 1.7221752
## [24,] 0.09301743 -1.6294192 1.815454 1.7224366
## [25,] 0.09132998 -1.6313070 1.813967 1.7226370
## [26,] 0.08985255 -1.6329380 1.812643 1.7227906
## [27,] 0.08855902 -1.6343493 1.811467 1.7229083
## [28,] 0.08742649 -1.6355721 1.810425 1.7229986
## [29,] 0.08643491 -1.6366328 1.809503 1.7230677
## [30,] 0.08556676 -1.6375540 1.808688 1.7231207
## 
## $co
##              fcst     lower    upper       CI
##  [1,]  0.02860225 -1.002299 1.059504 1.030902
##  [2,] -0.18223053 -1.746634 1.382173 1.564404
##  [3,] -0.13946266 -1.797922 1.518996 1.658459
##  [4,] -0.09433731 -1.753286 1.564611 1.658949
##  [5,] -0.07906979 -1.741568 1.583428 1.662498
##  [6,] -0.07947482 -1.743759 1.584809 1.664284
##  [7,] -0.08275547 -1.747299 1.581788 1.664543
##  [8,] -0.08441226 -1.748981 1.580156 1.664569
##  [9,] -0.08449693 -1.749084 1.580090 1.664587
## [10,] -0.08398345 -1.748583 1.580616 1.664599
## [11,] -0.08345134 -1.748059 1.581157 1.664608
## [12,] -0.08305335 -1.747668 1.581561 1.664614
## [13,] -0.08276343 -1.747383 1.581856 1.664619
## [14,] -0.08253274 -1.747156 1.582090 1.664623
## [15,] -0.08233427 -1.746960 1.582292 1.664626
## [16,] -0.08215884 -1.746787 1.582469 1.664628
## [17,] -0.08200405 -1.746634 1.582626 1.664630
## [18,] -0.08186839 -1.746500 1.582763 1.664631
## [19,] -0.08174992 -1.746382 1.582882 1.664632
## [20,] -0.08164648 -1.746280 1.582987 1.664633
## [21,] -0.08155607 -1.746190 1.583078 1.664634
## [22,] -0.08147696 -1.746111 1.583157 1.664634
## [23,] -0.08140772 -1.746042 1.583227 1.664634
## [24,] -0.08134710 -1.745982 1.583288 1.664635
## [25,] -0.08129403 -1.745929 1.583341 1.664635
## [26,] -0.08124757 -1.745883 1.583388 1.664635
## [27,] -0.08120689 -1.745842 1.583428 1.664635
## [28,] -0.08117128 -1.745807 1.583464 1.664635
## [29,] -0.08114010 -1.745775 1.583495 1.664635
## [30,] -0.08111280 -1.745748 1.583523 1.664635

7 SO2 Analysis

7.1 Stationarity

First plot the data.

  • Visual evidence of trend in some part of realization.
  • Realization show periodic behavior but that is not reflected in spectral density.
  • Peak at 0 and 0.13. But that’s not a strong evidence.
  • ACFs are not damping but has very mild evidence of sinusoidal behavior
  • Mild visual evidence that first half and second half of ACFs are different.
  • Variance is not constant
  • Time series not a stationary time series.

7.3 Dicky-Fuller Test

\(H_{0}\) - Unit root present
\(H_{1}\) - Unit root not present. or time series is stationary.

Conclusion

pvalue = 0.01. Reject Ho. Suggests strong evidence against presence of unit root.
No visual evidence of sesonality so time series is stationary.

## Warning in adf.test(tt_so2_ts$train_ts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  tt_so2_ts$train_ts
## Dickey-Fuller = -4.4819, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

p-value < 0.05 (using Cochrane-Orcutt Test) provides strong evidence that trend exist.

7.4 Model ID

## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic

AR(9,1) turns out to be the best mode as per AIC function.

7.5 ARIMA

7.5.1 Estimate Parameters

  • Ljung Box test fails for K-48. So Resudial assumptions are not met.
  • This series has seasonality and needs to be tranformed.
  • Since these are daily observations annual seasonal component can be removed. But for this s=365 instead of s=12 due to daily observations,
## 
## Coefficients of Original polynomial:  
## 1.4019 -0.5427 0.0695 -0.0085 0.1103 -0.0890 -0.0217 0.2146 -0.1417 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.9910B              1.0091               0.9910       0.0000
## 1-1.2524B+0.6857B^2    0.9132+-0.7902i      0.8281       0.1135
## 1-0.2339B+0.6427B^2    0.1819+-1.2340i      0.8017       0.2267
## 1+1.0616B+0.6205B^2   -0.8554+-0.9380i      0.7877       0.3677
## 1+0.7299B             -1.3700               0.7299       0.5000
## 1-0.7162B              1.3962               0.7162       0.0000
##   
## 
## Obs -0.003580677 0.002833143 0.002257824 -0.005517542 -0.009865679 0.005111487 -0.002897585 -0.006194265 0.01766593 -0.01688836 -0.01550766 -0.02972703 0.06277652 -0.02684268 0.0332766 -0.03924441 -0.04111658 0.003847906 -0.04384311 0.001074689 -0.01060938 0.01340347 0.04754802 -0.003186641
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 22.54284
## 
## $df
## [1] 14
## 
## $pval
## [1] 0.06812726
## Obs -0.003580677 0.002833143 0.002257824 -0.005517542 -0.009865679 0.005111487 -0.002897585 -0.006194265 0.01766593 -0.01688836 -0.01550766 -0.02972703 0.06277652 -0.02684268 0.0332766 -0.03924441 -0.04111658 0.003847906 -0.04384311 0.001074689 -0.01060938 0.01340347 0.04754802 -0.003186641 0.07430912 -0.03239694 -0.03565677 0.009201041 0.03763682 0.03658233 0.05640757 0.005842422 0.02560925 -0.08143254 -0.04506584 0.03648348 -0.03431505 0.1014667 0.02766491 0.04083197 0.01856858 -0.04007335 0.02468095 0.01147331 -0.04269395 0.06958953 -0.03259759 -0.01289488
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 95.11124
## 
## $df
## [1] 38
## 
## $pval
## [1] 8.485069e-07

7.5.2 Stationarize Series

  • Variance is not constant. log transformation needs to be applied before stationarzing series.
  • Remove seasonal component from time series.
  • Since these are daily observations annual seasonal component can be removed. But for this s=365 instead of s=12 due to daily observations,

## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
## [1] 0
## [1] -0.5099709 -0.1575921

7.5.3 Residual diagnositcs

## Obs -0.005321354 -0.01329364 -0.03335789 -0.01055413 -0.04111233 0.05351165 -0.0001298941 0.02857394 0.05623655 -0.00998852 0.01547255 0.02172267 -0.01797564 0.0007880134 0.03221442 -0.03517078 -0.03782465 0.009073126 0.01982738 0.04748212 0.01776086 0.01671858 0.04213314 -0.01442289
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 21.93383
## 
## $df
## [1] 22
## 
## $pval
## [1] 0.4638443
## Obs -0.005321354 -0.01329364 -0.03335789 -0.01055413 -0.04111233 0.05351165 -0.0001298941 0.02857394 0.05623655 -0.00998852 0.01547255 0.02172267 -0.01797564 0.0007880134 0.03221442 -0.03517078 -0.03782465 0.009073126 0.01982738 0.04748212 0.01776086 0.01671858 0.04213314 -0.01442289 -0.0006400589 -0.02179692 0.01077715 0.03591711 0.05877482 0.008174739 -0.02892867 0.01619192 -0.0102592 -0.04084097 -0.01085419 0.02591362 -0.03114494 0.06780411 0.03229151 0.06085462 -0.004367754 0.009638848 -0.005170652 -0.02481785 -0.04617717 0.02349056 0.03670645 -0.01562068
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 48
## 
## $chi.square
## [1] 49.18616
## 
## $df
## [1] 46
## 
## $pval
## [1] 0.3467937

7.6 Final Model

Final SO2 Model:

(1 - 1.40B + 0.558\(B^2\) - 0.088\(B^3\) + 0.028\(B^4\) - 0.118\(B^5\) + 0.088\(B^6\) + 0.028\(B^7\) - 0.218\(B^8\) + 0.148\(B^2\))(\(X_t\) - 18.53) = (1 - 0.89B) (\(a_t\))

7.7 SO2 Forecasting

Final SO2 Model:
(1 - 5\(B^{365}\) ) (\(X_t\) - 18.53) = (1 + 0.5B + 0.16\(B^2\) ) (\(a_t\))

  • Hypothesis test on residual passed with strong evidence indicating residuals are white p-value > 0.05 for both K=24 and K=48

## Window ASE

## [1] 477.5533

8 Conclusion

  • For PM2.5 which is correlated to other variables, VAR Model found to be useful.
  • Neural Network takes time but returns best results for univariate.
  • Similar techniques can be used for CO,NO2 forecasting.

9 References

  • Applied Time Series with R, Dr.Wayne Woodword,Henry L. Gray,Alan C. Elliott
  • An article on ENVIRONMENTAL EFFECTS OF AIR POLLUTION AND APPLICATION OF ENGINEERED METHODS TO COMBAT THE PROBLEM, Journal of Industrial Pollution Control.
  • Middle Tennessee Air Quality Forecast , Nashville Health Department.
  • An article on Major Air pollutants.
  • An article on common Air Pollutants and its effects on human beings.
  • World Air Quality Forecast
  • Kaggle article on handling Time series missing values. [1]

no2=artrans.wge(log(tt_no2_ts$train_ts),phi.tr=c(rep(0,356),1))